We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.
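As a rough illustration of the overall loop described above (not the paper's actual implementation), the following Python sketch shows how the pieces fit together: random-walk bootstrapping generates goals of increasing difficulty, policy rollout provides the improvement step, and the learning step fits a policy directly rather than a value function. The `domain` and `learner` interfaces used here (`actions`, `successor`, `satisfies`, `fit`) are hypothetical stand-ins for a relational planning domain and a relational policy learner.

```python
import random

def random_walk_problem(domain, start, walk_length):
    """Bootstrapping: take a random walk of `walk_length` steps from the
    start state and treat the endpoint as the goal. Problem difficulty
    grows with the walk length."""
    state = start
    for _ in range(walk_length):
        state = domain.successor(state, random.choice(domain.actions(state)))
    return start, state

def rollout_value(domain, policy, state, action, goal, horizon=50):
    """Estimate Q(state, action) under `policy` with a single rollout;
    reward is 1 on reaching the goal, 0 otherwise (sparse goal reward)."""
    state = domain.successor(state, action)
    for _ in range(horizon):
        if domain.satisfies(state, goal):
            return 1.0
        state = domain.successor(state, policy(state, goal))
    return 0.0

def api_with_policy_learning(domain, start, learner, iterations=10,
                             n_problems=100, horizon=50):
    """Approximate policy iteration where the usual value-function
    approximation step is replaced by supervised learning in policy space.
    Assumes `learner.fit` returns a callable policy(state, goal)."""
    policy = lambda s, g: random.choice(domain.actions(s))  # uninformed start
    walk_length = 1
    for _ in range(iterations):
        examples = []
        for _ in range(n_problems):
            state, goal = random_walk_problem(domain, start, walk_length)
            for _ in range(horizon):
                # Policy-rollout improvement: label each visited state with
                # the action whose rollout Q-estimate is highest.
                best = max(domain.actions(state),
                           key=lambda a: rollout_value(domain, policy,
                                                       state, a, goal))
                examples.append((state, goal, best))
                if domain.satisfies(state, goal):
                    break
                state = domain.successor(state, best)
        # Learning step in policy space: fit a compact (relational) policy
        # that imitates the rollout-improved action choices.
        policy = learner.fit(examples)
        walk_length += 1  # lengthen the walks as the policy improves
    return policy
```

In this sketch the walk length is increased on a fixed schedule; a more faithful variant would lengthen the walks only once the current policy solves the sampled problems reliably, so that training difficulty tracks policy quality.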